A Trainable Object-Tracking Method using Equivalent Retinotopical Sampling and Fisher Kernel

Author

  • Hirotaka Niitsuma
Abstract

In this paper, two object detection techniques in computer vision are proposed. The first is a trainable object-tracking method based on maximum likelihood; the second is an extension of support vector machines (SVMs). The first method extends Retinotopical Sampling (RS). RS is a Gaussian filter with an object detection mechanism, and its concept was inspired by human saccadic eye movements. However, when the object size is inferred by RS, the result tends to gravitate towards zero. This paper therefore proposes Equivalent Retinotopical Sampling (ERS), an extension of RS that reformulates RS by introducing the amount of information carried by each sampled point. The second method extends the discriminant function trained by SVMs for object recognition in an image. The discriminant function is formulated as an analytical function of the object position and the object size in the image. The extension consists of introducing ERS into SVMs.

Introduction

Support vector machines (SVMs) have yielded good generalization performance on a wide range of problems. In the pattern recognition field, SVMs have been applied to isolated handwritten digit recognition, object recognition, speaker identification, charmed quark detection, face detection in images, and text categorization. In most of these applications, the generalization performance of SVMs either matches or is significantly better than that of competing methods. In this paper, an extension of a discriminant function trained by an SVM for object recognition in an image is suggested. With this extension, the discriminant function is formulated as an analytical function of the object position and the object size in the image. This extension yields a trainable object-tracking method in the form of a gradient descent method for the discriminant function, as illustrated in Figure 8. The extension introduces the concept of Retinotopical Sampling (RS) (Smeraldi & Bigun 2002) into SVMs.
The concept of RS was inspired by the mechanism of the human eye (Smeraldi & Bigun 2002). Using the concept of RS, a statistical object model is defined. Then a Fisher kernel (Jaakkola & Haussler 1999) based on this statistical model is defined. This kernel function is an analytical function of the object position and the object size. Consequently, the discriminant function also becomes an analytical function of the object position and the object size.

Copyright © 2003, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Statistical Object Model

In this section, the statistical object model is described, and a maximum-likelihood-based trainable object-tracking method is discussed. The concept of RS was inspired by human saccadic eye movements and enables quick recognition of eyes and mouths in static images (Smeraldi & Bigun 2002). RS is a Gaussian filter with an object detection mechanism. Tao et al. formulated object tracking with a similar model (Tao, Sawhney, & Kumar 2002), in which the object position is represented by a Gaussian prior distribution. That model enables estimation in a maximum a posteriori (MAP) framework using a generalized expectation-maximization (EM) algorithm. Moreover, RS is fast to compute. In this paper, RS is used instead of a Gaussian prior distribution. When the object size is inferred by RS, the result tends to gravitate towards zero. To avoid this difficulty, Equivalent Retinotopical Sampling (ERS), an extension of RS, is proposed. ERS can infer the size of objects more accurately. Using ERS, a trainable static object-tracking method with essentially only two parameters has been formulated.

Notation and Model

In this paper, an image is represented as the following set:

$$ I = \left\{ \left(x_1, y_1, i_1, \frac{\partial i_1}{\partial \mathbf{x}}\right), \left(x_2, y_2, i_2, \frac{\partial i_2}{\partial \mathbf{x}}\right), \ldots \right\} = \{X_1, X_2, \ldots\} \tag{1} $$

$$ X_n = X(\mathbf{x}_n) = \left(\mathbf{x}_n, i_n, \frac{\partial i_n}{\partial \mathbf{x}}\right) $$

$$ X(\mathbf{x}) = \left(\mathbf{x}, i(\mathbf{x}), \frac{\partial i}{\partial \mathbf{x}}(\mathbf{x})\right) $$

Here, $\mathbf{x}_n = (x_n, y_n)$ denotes the coordinates of the nth pixel, and $i_n = i(\mathbf{x}_n)$ denotes the intensity of the nth pixel.
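The pixel-state set of Equation (1) can be sketched in a few lines of Python. This is only a minimal illustration, not the author's implementation: the function name `pixel_states`, the finite-difference approximation of the intensity gradient, and the toy image are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

# One pixel state X_n = (x, y, i, di/dx, di/dy): five numbers per pixel.
PixelState = Tuple[float, float, float, float, float]


def _diff(values: Sequence[float], k: int) -> float:
    """Central difference in the interior, one-sided at the border (assumed scheme)."""
    lo, hi = max(k - 1, 0), min(k + 1, len(values) - 1)
    return (values[hi] - values[lo]) / (hi - lo) if hi > lo else 0.0


def pixel_states(image: List[List[float]]) -> List[PixelState]:
    """Turn a grayscale image (rows of intensities) into the set {X_1, X_2, ...}."""
    columns = list(zip(*image))  # columns[x][y] == image[y][x]
    states: List[PixelState] = []
    for y, row in enumerate(image):
        for x, intensity in enumerate(row):
            di_dx = _diff(row, x)          # gradient along the x axis
            di_dy = _diff(columns[x], y)   # gradient along the y axis
            states.append((float(x), float(y), float(intensity), di_dx, di_dy))
    return states


# Hypothetical 3x3 toy image, used only to exercise the function.
image = [
    [0.0, 1.0, 2.0],
    [0.0, 2.0, 4.0],
    [0.0, 3.0, 6.0],
]
states = pixel_states(image)
print(len(states))  # 9, one state per pixel
print(states[4])    # (1.0, 1.0, 2.0, 2.0, 1.0), the centre pixel
```

Each state has five components (two coordinates, one intensity, two gradient components), which matches the five-dimensional normal distribution used for the pixel state in the text.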
$\frac{\partial i_n}{\partial \mathbf{x}} = \frac{\partial i}{\partial \mathbf{x}}(\mathbf{x}_n)$ is the intensity gradient at $\mathbf{x}_n$. $X$ denotes the state of a pixel at $\mathbf{x}$, and $X_n$ denotes the state of the nth pixel. The designated objects (for training) are represented by the following Gaussian mixture distribution for the state of a pixel, $p(X|\Theta)$:

$$ p(X|\Theta) = \sum_{k=1}^{M} p_k N_5(X; \varsigma_k, \Sigma_k) \tag{2} $$

$$ \Theta = (\varsigma_1, \Sigma_1, \ldots, \varsigma_M, \Sigma_M) $$

where $N_l(x; \varsigma, \Sigma)$ is the $l$-dimensional normal distribution:

$$ N_l(x; \varsigma, \Sigma) = \frac{1}{\sqrt{(2\pi)^l |\Sigma|}} \exp\left(-\frac{(x - \varsigma)^{\top} \Sigma^{-1} (x - \varsigma)}{2}\right) $$

Here, the parameter $\Theta$ is chosen to maximize the likelihood of all pixels in the images designated for training. $\Theta$ is determined by the method proposed by Verbeek et al. (Verbeek, Vlassis, & Krose). The model proposed by Tao et al. (Tao, Sawhney, & Kumar 2002) uses an appearance model of the form $p(i \mid n)$, where $n$ is the pixel number. Because a simple Gaussian mixture model ...
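The mixture density of Equation (2) can be evaluated with the short sketch below. It is a hedged illustration rather than the paper's code: the helper names (`cholesky`, `normal_pdf`, `mixture_pdf`), the use of a Cholesky factorization, and the toy parameters are assumptions, and the normal density uses the standard $(2\pi)^l |\Sigma|$ normalization. Fitting $\Theta$ itself (the method of Verbeek et al.) is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def cholesky(sigma: Matrix) -> List[List[float]]:
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    lower = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sigma[r][c] - sum(lower[r][k] * lower[c][k] for k in range(c))
            lower[r][c] = math.sqrt(s) if r == c else s / lower[c][c]
    return lower


def normal_pdf(x: Vector, mean: Vector, sigma: Matrix) -> float:
    """l-dimensional normal density N_l(x; mean, sigma)."""
    n = len(x)
    lower = cholesky(sigma)
    # Forward substitution solves L z = (x - mean); z.z is the quadratic form.
    z: List[float] = []
    for r in range(n):
        acc = x[r] - mean[r] - sum(lower[r][k] * z[k] for k in range(r))
        z.append(acc / lower[r][r])
    log_det = 2.0 * sum(math.log(lower[r][r]) for r in range(n))
    quad = sum(v * v for v in z)
    return math.exp(-0.5 * (n * math.log(2.0 * math.pi) + log_det + quad))


def mixture_pdf(x: Vector, weights: Sequence[float],
                means: Sequence[Vector], sigmas: Sequence[Matrix]) -> float:
    """p(X | Theta) = sum_k p_k N(X; mean_k, sigma_k)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


# Hypothetical two-component model over 5-D pixel states (toy values only).
identity = [[1.0 if r == c else 0.0 for c in range(5)] for r in range(5)]
weights = [0.5, 0.5]
means = [[0.0] * 5, [1.0, 0.0, 0.0, 0.0, 0.0]]
p = mixture_pdf([0.0] * 5, weights, means, [identity, identity])
print(p)
```

The Cholesky factor gives both the determinant and the quadratic form without an explicit matrix inverse, which is one common design choice for evaluating Gaussian densities with full covariance matrices.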


Similar resources

Using a Novel Concept of Potential Pixel Energy for Object Tracking

Abstract: In this paper, we propose a new method for kernel-based object tracking that tracks the complete non-rigid object. Defining the union image blob and mapping it to a new representation, which we call the potential pixels matrix, are the main parts of the tracking algorithm. The union image blob is constructed by expanding the previous object region based on the histogram feature. The pote...


Visual Tracking using Kernel Projected Measurement and Log-Polar Transformation

Visual servoing generally consists of control and feature tracking. A study of previous methods shows that no attempt has been made to optimize these two parts together. In the kernel-based visual servoing method, the main objective is to combine and optimize these two parts together and to form a complete control loop. This is accomplished by using Lyapunov theory. A Lyapunov candidat...


A Non-Parametric Trainable Object-Detection Model Using A Concept Of Retinotopic Sampling

A retina has a space-variant sampling mechanism and an orientation-sensitive mechanism. The space-variant sampling mechanism of the retina is called retinotopic sampling (RS). With these mechanisms, object detection is formulated as finding an appropriate coordinate transformation from a coordinate system on an input image to a coordinate system on the retina. However, when the ...


A Geometry Preserving Kernel over Riemannian Manifolds

Abstract: The kernel trick and projection to tangent spaces are two choices for linearizing data points lying on Riemannian manifolds. These approaches provide the prerequisites for applying standard machine learning methods on Riemannian manifolds. Classical kernels implicitly project data to a high-dimensional feature space without considering the intrinsic geometry of the data points. ...


Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection

Automatic processing of seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is seismic object detection using different supervised classification methods, whose final output is a probability cube. The object detection process starts with generating a pickset of two classes, labeled as object and non-object, and then selecting ...




Publication date: 2003